Are Random Axioms Useful? 
The famous Gödel incompleteness theorem says that for every sufficiently rich formal theory
(containing formal arithmetic in some natural sense) there exist true unprovable statements. Such
statements would be natural candidates for being added as axioms, but where can we obtain them?
One classical (and well-studied) approach is to add (to some theory T) an axiom that claims the
consistency of T. In this note we discuss another approach (motivated by Chaitin's version of
Gödel's theorem) and show that it is not really useful (in the sense that it does not help us to prove
new interesting theorems; see Proposition 2 below). However, the situation could change if we
take into account the size of the proofs.

1 Gödel's theorem in Chaitin's version

Kolmogorov complexity $C(x)$ of a binary string x is defined as the minimal length of a program
(without input) that outputs x and terminates. (This definition depends on a programming
language, and one should choose one that makes complexity minimal up to an $O(1)$ additive term.)
Most strings of length n have complexity close to n. More precisely, the fraction of n-bit strings
that have complexity less than $n - c$ is at most $2^{-c}$. In particular, there exist strings of
arbitrarily high complexity.
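
The counting behind the $2^{-c}$ bound can be checked directly. The sketch below assumes that programs are binary strings (the usual convention, though not stated explicitly above); the function name is introduced here for illustration only.

```python
from fractions import Fraction


def max_fraction_compressible(n: int, c: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings x with C(x) < n - c.

    Assumption: programs are binary strings, and each program outputs
    at most one string.
    """
    # Binary strings of length 0, 1, ..., n-c-1: there are 2**(n-c) - 1 of them,
    # so at most that many strings can have complexity below n - c.
    short_programs = 2 ** (n - c) - 1 if n >= c else 0
    return Fraction(short_programs, 2 ** n)
```

For example, `max_fraction_compressible(10, 3)` is 127/1024, which is below $2^{-3}$.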

However, as G. Chaitin pointed out, the situation changes if we look for strings of provably high
complexity. More precisely, we are looking for strings x and numbers n such that the statement
"$C(x) > n$" (for these x and n, so it is a closed statement) is provable in formal arithmetic. Chaitin
noted that all n that appear in these statements are less than some constant c. (The argument is
a version of the Berry paradox: assume that for an arbitrary integer k we can find some string x such
that $C(x) > k$ is provable; let $x_k$ be the first such string in the order of enumeration of all proofs;
this definition provides a program of size $O(\log k)$ that generates $x_k$, which is impossible for
large k if $C(x_k) > k$.)
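
The program used in the Berry-paradox argument can be sketched as follows. This is only an illustration: a real version would enumerate all proofs of formal arithmetic, whereas here the stream of proved statements is a hypothetical input `theorems`, where a pair `(x, n)` stands for a proved statement "C(x) > n".

```python
from typing import Iterable, Optional, Tuple


def first_provably_complex(k: int,
                           theorems: Iterable[Tuple[str, int]]) -> Optional[str]:
    """Return the first string x for which a bound C(x) > n with n >= k appears.

    `theorems` is a hypothetical stand-in for the enumeration of all proofs.
    Apart from this fixed search loop, the only data the program needs is k,
    which takes O(log k) bits to write down.
    """
    for x, n in theorems:
        if n >= k:  # "C(x) > n" with n >= k implies "C(x) > k"
            return x
    return None  # never reached if such statements are provable for every k
```

If such a string were found for every k, this search would describe a string of complexity above k using only about log k bits, which is the contradiction used in the text.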

Another proof of the same result shows that Kolmogorov complexity is not very essential 
here; by a standard fixed-point argument one can construct a program p (without input) such that 
for every program q (without input) the assumption that p is equivalent to q (produces the same 
output if p terminates, and does not terminate if p does not terminate) is consistent with formal 
arithmetic. If p has length k, for every x we may assume without contradiction that p produces x, 
so one cannot prove that C (x) > k. 

2 Incompressibility axioms 

This leads to a natural idea: take a random string x of length n and consider the statement
$C(x) > n - 1000$. It is true unless we are extremely unlucky. The probability of being unlucky is less
than $2^{-1000}$. In statistics and natural science we are accustomed to identifying this with
impossibility. So we can add this statement and be sure that it is true; if n is large enough, we get a
true non-provable statement and could use it as a new axiom. We can even repeat this procedure
several times: if the number of iterations m is not astronomically large, $2^{-1000} m$ is still
astronomically small.
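
To get a feeling for the numbers, the union bound over m repetitions can be evaluated exactly. This is a minimal sketch; the function name and the default slack of 1000 bits are chosen here to match the example above.

```python
from fractions import Fraction


def failure_bound(m: int, slack: int = 1000) -> Fraction:
    """Union bound on the probability that at least one of m random axioms
    of the form C(x) > n - slack is false."""
    return min(Fraction(1), m * Fraction(1, 2 ** slack))
```

Even for a billion iterations the bound stays below $2^{-970}$, since $10^9 < 2^{30}$.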

Now the question: can we obtain a richer theory in this way and get some interesting
consequences, while still being practically sure that they are true?

The answer that we give (using a very simple argument): 

• yes, this is a safe way of enriching our theory (formal arithmetic), see Proposition 1;

• yes, we can get a stronger theory in this way (Chaitin's theorem), but

• no, we cannot prove anything interesting in this way, see Proposition 2.

3 Random axioms: soundness 

Let us first prove that the chances of getting a false statement in this way are small. In fact, we can
allow an even more flexible procedure of adding random axioms that does not mention Kolmogorov
complexity explicitly. This procedure goes as follows: assume that we have already proved, for
some number (numeral) N, for some rational $\delta > 0$ and for some property $R(x)$ (with one free
string variable x), that the number of strings x of length N such that $\neg R(x)$ does not exceed $\delta 2^N$
(this is a closed arithmetic formula). Then we are allowed to toss a fair coin N times, generating a
string r of length N, and consider the formula $R(r)$ as a new axiom. But we have to pay $\delta$ for each
operation of this type, and initially we have only some fixed amount $\varepsilon$. Intuitively, $\varepsilon$ measures the
maximal probability that we agree to consider as "negligible".

In this framework we shall speak not about proofs (as sequences of formulas constructed
according to some rules like modus ponens etc.), but about proof strategies that say which steps
should be made for every outcome (i.e., for every string r if the proof strategy comes to a randomized
step). For a proof strategy $\pi$, formula F and initial capital $\varepsilon$ we can measure the probability that
$\pi$ achieves (proves) F. The following proposition says that this procedure indeed can be trusted:

Proposition 1 (soundness). If the probability to achieve F for a proof strategy $\pi$ with initial
capital $\varepsilon$ is greater than $\varepsilon$, then F is true.

This proposition says that the probability of proving a false statement is bounded by $\varepsilon$.
Moreover, the probability of the event "a false statement appears in the proof" is bounded by $\varepsilon$. This is
intuitively obvious because a false statement means that one of the bad events has happened, and
the sum of probabilities is bounded by $\varepsilon$. However, this argument cannot be understood literally
since different branches have different bad events (after branching). Formally one should prove
by backward induction that at every point of the tree where the remaining capital is $\gamma$ and there
are still no false statements, the conditional probability to get a false statement assuming that we
have reached this point, is bounded by $\gamma$. (End of proof.)
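
The induction step of this argument can be written as a single estimate. The notation is introduced here for illustration and does not come from the text: $q(u)$ is the conditional probability that a false statement appears at or below a vertex $u$, given that $u$ is reached with only true statements accepted so far; $c(u)$ is the remaining capital at $u$; $\delta(u)$ is the price of the random step made at $u$; and $v$ ranges over the sons of $u$ whose new axiom is true.

```latex
% Random step at u: the bound on the number of bad strings has been proved from
% true statements, so the new axiom R(r) is false with probability at most \delta(u).
% Otherwise we move to a son v with capital c(u) - \delta(u) and apply the
% induction assumption q(v) \le c(v) = c(u) - \delta(u).
q(u) \;\le\; \delta(u) + \max_{v} q(v)
     \;\le\; \delta(u) + \bigl(c(u) - \delta(u)\bigr)
     \;=\; c(u).
```

At a leaf no further statements are added, so $q = 0$ there, and deterministic steps only add consequences of true statements, so they do not change the bound.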

4 Conservative extension? 

As Chaitin's theorem shows, there are proof strategies that with high probability lead to some
statements that are non-provable (in formal arithmetic). However, the situation changes if we want
to get some fixed statement, as the following proposition (which is a stronger version of
Proposition 1) shows:

Proposition 2 (conservation). If the probability to get F for a proof strategy $\pi$ with initial capital
$\varepsilon$ is greater than $\varepsilon$, then F is provable.

Let us start with a simple case where there is only one randomized step. So some formula $R(x)$
is fixed, as well as some integer N, and it is provable (in formal arithmetic without extensions)
that there are at most $\varepsilon 2^N$ strings x of length N such that $\neg R(x)$. We denote this statement by (*)
in the sequel. Let us say that a string x of length N is strong if the formula $R(x) \to F$ is provable
in arithmetic. The assumption guarantees that the fraction of strong x among strings of length N
is greater than $\varepsilon$. Let $x_1, \ldots, x_t$ be all of them. Then $R(x_i) \to F$ is provable for each i, therefore

$$R(x_1) \lor R(x_2) \lor \ldots \lor R(x_t) \to F$$

is also provable. But the disjunction on the left-hand side provably follows from (*): there are
more than $\varepsilon 2^N$ strings $x_i$, so (*) does not allow $R$ to fail on all of them. Since (*) is
provable according to the assumption, F is also provable.

In the general case we need to be a bit more careful. We say that a vertex u in a proof strategy
tree is strong if F provably follows from the statements accepted up to that point (on the way
to u) in the proof process. For every vertex u in the proof tree we also consider the remaining
capital $\rho(u)$, and the probability $p(u)$ to finish the proof successfully starting from that point
(= the conditional probability of successful proof under the condition that it goes through u). Then
we prove the following statement using backward induction (from leaves to the root) over u: if
$p(u) > \rho(u)$, then the vertex u is strong. For the tree root this is the statement of the Proposition;
for leaves the value of $p(u)$ is either 0 or 1, and if it is 1, then u is strong.

The induction step uses the same argument as in the previous paragraph. Consider some vertex
u of the proof tree. Let $\delta(u)$ be the threshold used for branching at u. If the fraction of strong
vertices among the sons of u is greater than $\delta(u)$, then the vertex u itself is strong (since
$R(x) \to F$ is provable at u for every x that corresponds to a strong son, and the disjunction of
$R(x)$ over these x is also provable at u). So let us assume that this fraction does not exceed
$\delta(u)$. Then at u the random choice will lead to a strong vertex with probability at most $\delta(u)$.
And if this is not the case, the chances of finishing the proof successfully (for each weak son) do
not exceed the remaining capital $\rho(u) - \delta(u)$ by the induction assumption. So the total probability
of finishing the proof successfully starting at u does not exceed
$\delta(u) + (\rho(u) - \delta(u)) = \rho(u)$. (End of proof.)
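
The bookkeeping of this induction can be replayed on a toy tree. The sketch below is an illustration under simplifying assumptions, not a model of real proofs: a leaf is just marked as successful or not, all sons of a vertex are equally likely, and a vertex is called strong by the sufficient rule used in the proof (a successful leaf, or a vertex whose fraction of strong sons exceeds its threshold). The names `Node` and `analyse` are introduced here.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List, Tuple


@dataclass
class Node:
    success: bool = False            # meaningful for leaves only
    delta: Fraction = Fraction(0)    # price of the random step at this vertex
    sons: List["Node"] = field(default_factory=list)


def analyse(u: Node, capital: Fraction) -> Tuple[Fraction, bool]:
    """Return (success probability, strong flag) for u reached with `capital`."""
    if not u.sons:
        return (Fraction(1) if u.success else Fraction(0)), u.success
    assert capital >= u.delta, "the strategy may not overspend its capital"
    results = [analyse(v, capital - u.delta) for v in u.sons]
    p = sum(r[0] for r in results) / len(u.sons)
    strong_fraction = Fraction(sum(r[1] for r in results), len(u.sons))
    strong = strong_fraction > u.delta
    if not strong:
        # The inequality proved by backward induction: a vertex that is not
        # strong succeeds with probability at most its remaining capital.
        assert p <= capital
    return p, strong
```

For instance, a root with threshold 1/4 and four leaf sons, three of them successful, has success probability 3/4, exceeds a capital of 1/4, and is indeed strong.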

5 Polynomial size proofs 

The situation changes drastically if we are interested in the length of proofs. The argument used
in Proposition 2 gives an exponentially long proof compared with the original probabilistic proof
(since we need to combine the proofs for all terms in the disjunction). (Here the complexity of a
probabilistic proof is measured as the total length of formulas in the branch where this total length
is maximal.) Maybe one can find another construction that shows that our "probabilistic" proofs
can be transformed into normal ones with only a polynomial increase in size? Some reason why
this should be hard is provided by the following Proposition 3.

Proposition 3. Assume that such a polynomially bounded transformation is possible. Then com- 
plexity classes PSPACE and NP coincide. 

It is enough to consider a standard PSPACE-complete language, the language TQBF of true
quantified Boolean formulas. A standard interactive proof for this language (see, e.g., 0) uses
an Arthur-Merlin scheme where the verifier ("Arthur") is able to perform polynomial-size
computations and obtain random bits (that cannot be corrupted by Merlin). The correctness of this scheme
is based on simple properties of finite fields. This kind of proof can be easily transformed into a
successful probabilistic proof strategy in our sense. To explain this transformation, let us recall
some details of the construction. Assume that a quantified Boolean formula F starting with a
universal quantifier is given. This formula is transformed into a statement that says that for some
polynomial $P(x)$ the two values $P(0)$ and $P(1)$ are equal to 1. This polynomial is implicitly defined
by a sequence of operations. To convince Arthur that this is indeed the case, Merlin shows Arthur
this polynomial (listing its coefficients explicitly; there are polynomially many of them). In other
terms, Merlin notes that the formula

$$[\forall x\,(P(x) = \tilde P(x))] \Rightarrow [P(0) = 1 \land P(1) = 1]$$

and therefore

$$[\forall x\,(P(x) = \tilde P(x))] \Rightarrow F$$

is true (and provable in arithmetic). Here $P(x)$ is the polynomial P defined as the result of the
sequence of operations, while $\tilde P(x)$ is the explicitly given expression for P. Then Merlin notes that
the implication

$$[P(r) = \tilde P(r)] \Rightarrow \forall x\,(P(x) = \tilde P(x))$$

is true for most elements r of the finite field and this fact is provable (using basic results about
finite fields), so such an implication can be added as a new axiom. Continuing in this way, we get
a probabilistic proof strategy that mimics Merlin's behavior in the IP-protocol for TQBF.

This strategy gives probabilistic proofs of polynomial length (assuming our proof system is
not unreasonably inefficient). If they could be transformed into non-probabilistic proofs (and the
underlying theory is consistent), these non-probabilistic proofs would be NP-witnesses for TQBF,
and therefore PSPACE=NP. (End of proof.)
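
The finite-field fact behind the random step is presumably the standard one: two distinct polynomials of degree at most d over a field agree on at most d points, so a random point exposes a wrong polynomial with high probability when the field is large. The sketch below checks this by brute force over a small prime field; the function names, the coefficient-list representation and the prime 101 are choices made here for illustration.

```python
from typing import List


def poly_eval(coeffs: List[int], x: int, p: int) -> int:
    """Horner evaluation of sum(coeffs[i] * x**i) modulo the prime p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def agreements(f: List[int], g: List[int], p: int) -> int:
    """Number of field elements r in {0, ..., p-1} with f(r) = g(r) mod p."""
    return sum(1 for r in range(p) if poly_eval(f, r, p) == poly_eval(g, r, p))
```

For example, over the field with 101 elements the polynomials $1 + 2x + 3x^2$ and $1 + 3x^2$ differ by $2x$ and therefore agree only at $x = 0$, so a random r fails to separate them with probability 1/101.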

6 Adding full information about complexities 

Let us be more generous and add to formal arithmetic full information about Kolmogorov com- 
plexity of all strings; for each string x we add an axiom "C (x) = k" where k is the literal represent- 
ing Kolmogorov complexity of x. (Of course, this procedure is non-constructive.) What theory do 
we get? It turns out that this theory can be easily described in logical terms: 

Proposition 4. This theory can be obtained from formal arithmetic by adding all true universal 
statements. 

First, note that the statements $C(x) > n$ (for some string x and number n) are universal
statements (all programs of length at most n do not produce x). The statements of the form $C(x) < n$
are existential and therefore are provable when true. So adding all true universal statements is
enough to get full information about complexities.

More interesting is the other direction. Now we assume that we have all the information
about Kolmogorov complexities; we need to show that every true universal statement becomes
provable. This can be done by an argument which is a logical translation of the Turing completeness
result for the set of pairs (x, n) such that $C(x) < n$. Here it is.

The Kolmogorov complexity function is enumerable from above: simulating in parallel all the
programs, we get an upper bound for C, and this upper bound decreases with time. Let $T_n$ be the
moment when this upper bound reaches its final value (i.e., the true value of C) for all strings of
length at most n.
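
The notions of a time-bounded upper bound and of the moment when it settles can be illustrated on a toy example. Everything below is hypothetical: real Kolmogorov complexity is not computable, so the "machine" is a small hand-written table that lists, for each program, its output and running time (or `None` if it never halts).

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy machine: program -> (output, number of steps), None = no halt.
TOY_PROGRAMS: Dict[str, Optional[Tuple[str, int]]] = {
    "0": ("00", 7),
    "1": None,
    "00": ("0", 1),
    "01": ("1", 2),
    "10": ("00", 1),
    "11": ("11", 3),
}


def bounded_complexity(x: str, t: int) -> float:
    """Length of the shortest toy program that outputs x within t steps."""
    lengths = [len(prog) for prog, run in TOY_PROGRAMS.items()
               if run is not None and run[0] == x and run[1] <= t]
    return min(lengths, default=math.inf)


def settling_time(strings: List[str]) -> int:
    """Smallest t at which the upper bound has its final value on `strings`."""
    horizon = max(run[1] for run in TOY_PROGRAMS.values() if run is not None)
    final = {x: bounded_complexity(x, horizon) for x in strings}
    return next(t for t in range(horizon + 1)
                if all(bounded_complexity(x, t) == final[x] for x in strings))
```

In this table the bound for the string "00" drops from 2 to 1 only at step 7, when the slow one-bit program finally halts, so the settling time for the strings "0", "1", "00", "11" is 7.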

Note that $C(T) > n - O(\log n)$ for every $T > T_n$. Indeed, knowing n and T, one can find the
true values of C for all strings of length n, and choose the first one that has complexity at least n
(it always exists). So n and T are enough to construct a string of complexity at least n, and the
complexity of n is only $O(\log n)$.

Another simple observation: every program of length at most $n - O(\log n)$ either terminates within
$T_n$ steps or does not terminate at all. Indeed, each terminating program may be considered as a
description of the number of steps it takes to terminate, so the complexity of this number does not
exceed the length of the program, and it remains to use the previous statement.

Both observations can be formalized as provable statements of formal arithmetic. If we have (as
additional axioms) full information about Kolmogorov complexities of all strings, then for each
n one can prove a statement $T_n = t$ for some numeral t. Having such a statement, we can then
prove non-termination of every non-terminating program of size at most $n - O(\log n)$. And this is
enough to prove every true universal statement, since it can be reformulated as non-termination of
some program. (End of proof.)

Remark. A sceptic could (rightfully) claim that the exact value of Kolmogorov complexity
can encode some additional irrelevant information (in particular, about universal statements' truth
values). For example, it is not obvious a priori that the theory considered in Proposition 4 does
not depend on the choice of optimal programming language fixed in the definition of Kolmogorov
complexity. Also one may ask whether we consider plain or prefix complexity.

To address all these questions, let us consider a smaller set of axioms and assume that for every
x the complexity of x is guaranteed up to a factor of 2, i.e., some axiom $C(x) > c_x$ is added where
$c_x$ is at least half of the true complexity of x. (We do not assume anything else about the choice
of $c_x$; this choice cannot be made computably for obvious reasons.) Still the argument works
with minor modifications. Let $C^t(x)$ be the upper bound for complexity obtained if we restrict
computation time by t. Let $x(n,t)$ be the string of length at most n that makes $C^t(x)$ maximal
(among all x of length at most n, for given t). Then $C^t(x(n,t))$ is at least n, though $C(x(n,t))$ can
be much smaller (for small t). By $T_n$ let us denote the minimal T such that $C^T(x) < 2C(x)$ for all
strings x of length at most n. The following statement is true for every t and n:

$$t > T_n \Rightarrow C(x(n,t)) > n/2.$$

Indeed, $C^t(x(n,t)) \ge n$ by definition, and if $t > T_n$, this guarantees $C(x(n,t)) > n/2$. Since
$x(\cdot,\cdot)$ is computable, $C(x(n,t)) \le C(t) + O(\log n)$. Therefore,

$$t > T_n \Rightarrow C(t) > n/2 - O(\log n).$$

Again, this implies that every program of size at most $n/2 - O(\log n)$ that does not terminate in
$T_n$ steps does not terminate at all. This is not only true, but also provable (with a natural formal
representation of all involved notions). And, having our additional axioms, we are able (for each
n) to find some provable (using the axioms) upper bound for $T_n$, and therefore prove non-termination
for every non-terminating program of size at most $n/2 - O(\log n)$.

Question: What happens if we add only some axioms about complexities? Is it possible, for
example, to add axioms $C(x) > n - O(1)$ for most of the strings x of length n in such a way that a
statement $\varphi$ (fixed in advance) remains unprovable?